Abstract

A new classifier for Polarimetric SAR (PolSAR) images is proposed and assessed in this paper. Its input consists 
of segments, and each one is assigned the class which minimizes a stochastic distance. Assuming the complex 
Wishart model, several stochastic distances are obtained from the $h$-$\phi$ family of divergences, and they are employed
to derive hypothesis test statistics that are also used in the classification process. This article also presents, as a
novelty, analytic expressions for the test statistics based on the following stochastic distances between complex
Wishart models: Kullback-Leibler, Bhattacharyya, Hellinger, Rényi, and Chi-Square; also, the test statistic based
on the Bhattacharyya distance between multivariate Gaussian distributions is presented. The classifier performance
is evaluated using simulated and real PolSAR data. The simulated data are based on the complex Wishart model,
aiming at the analysis of the proposal with well-controlled data. The real data refer to the complex L-band image, acquired
during the 1994 SIR-C mission. The results of the proposed classifier are compared with those obtained by a Wishart 
per-pixel/contextual classifier, and we show the better performance of the region-based classification. The influence 
of the statistical modeling is assessed by comparing the results using the Bhattacharyya distance between multivariate 
Gaussian distributions for amplitude data. The results with simulated data indicate that the proposed classification 
method has a very good performance when the data follow the Wishart model. The proposed classifier also performs 
better than the per-pixel/contextual classifier and the Bhattacharyya Gaussian distance using SIR-C PolSAR data. 

Index Terms 

Region-Based Classification, Stochastic Distances, Hypothesis Tests, Polarimetry, Wishart distribution 

I. INTRODUCTION 

The classification of images obtained by polarimetric synthetic aperture radar (PolSAR) sensors is one of the 
main information extraction techniques from that kind of data. Generally, PolSAR classification falls into three 
categories: target decomposition [1], PolSAR data statistical modeling [2], and hybrid methods [3], [4], involving
both the statistical modeling and target decomposition methods.

Regarding the statistical modeling, the multiplicative model, which takes into account the contributions of 
both the backscatter and the speckle, has been suitably employed. The return can be modeled by the complex 
Wishart distribution [5], [6]. Other models have been proposed in the literature for PolSAR data, markedly the
$\mathcal{G}_P$ distribution (which has as particular cases the polarimetric $\mathcal{K}_P$ and $\mathcal{G}^0_P$ distributions [2], [7]), the Gaussian
scale mixture [8], [9], generalized complex Gaussian laws [10], and the $\mathcal{U}$ and other models stemming from the
multiplicative hypothesis [11]-[14]. These models are more flexible than the Wishart law (they all include the latter
as a particular case), at the expense of employing additional parameters whose estimation is oftentimes cumbersome.

Several pixel-based classifiers were developed from the Wishart distribution, among them the maximum
likelihood classifier used in [6] and the unsupervised procedure employed in [15]. Pixel-based classifiers can be
improved by the use of spatial context. Frery et al. [2] developed an Iterative Conditional Modes (ICM) classifier
which employs the maximum likelihood classification result under the complex Wishart distribution as starting point,
pointwise evidence, and the Potts model as local information. This classifier quantifies the spatial information by a
maximum pseudolikelihood estimator, in a completely different manner from the way segments do. The Potts model codes
the influence of neighboring classes (typically a few; eight in the implementation discussed here) in a parametric way,
whereas a segment is already expected to be a group of data with similar properties. The ICM algorithm proceeds
iteratively until convergence, whereas segment classification by distance minimization is a single-step technique.

It is believed that even better PolSAR classification results can be achieved using segmented images (region-based 
classification). This classification strategy may use a supervised scheme based on stochastic distances between the 
statistical distributions that model segments and training samples which represent classes. In the case of PolSAR 
data, these distances must be defined between pairs of complex Wishart distributions. 

Salicrú et al. [16] developed analytical dissimilarity measures, the so-called $h$-$\phi$ family of divergences. Hypothesis
tests based on statistics derived from these divergences were also developed in [16]. Frery et al. [17], [18] obtained
five different distances between complex Wishart distributions: Kullback-Leibler, Bhattacharyya, Hellinger, Rényi
and Chi-Square; their corresponding hypothesis tests were also developed and evaluated.

A PolSAR region-based classifier using the test statistic derived from the Bhattacharyya stochastic distance
between two complex Wishart models was proposed in [19]. The promising results obtained using this classifier
on the L-band SIR-C image led us to improve it by introducing new stochastic distances
and their corresponding hypothesis tests. In addition to describing in more detail the algorithm developed in [19],
this article breaks new ground by presenting analytical expressions for the test statistics based on the following 
stochastic distances between complex Wishart distributions: Kullback-Leibler, Bhattacharyya, Hellinger, Renyi and 
Chi-Square, and also the statistic based on the Bhattacharyya distance between multivariate Gaussian distributions. 
The classifier performance is evaluated using simulated and real PolSAR data. The simulated data is based on the 
complex Wishart model and the symmetric circularity assumption, aiming at the analysis of such application in 
statistically well controlled data. The real data refer to the complex L-band image, acquired during the 1994 SIR-C 
mission. 

II. Stochastic Distances and Associated Tests 

Mahalanobis presented the concept of a distance between distributions in the sense that there are pairs of
probability laws which are easier to distinguish than others. Such quantities have received a number of denominations
as, for instance, measures of separation, measures of discriminatory information and measures of variation-distance.
Many goodness-of-fit tests, such as the likelihood ratio, the chi-square, the score and Wald tests, can be defined
in terms of appropriate distance measures between distributions. They all have in common test statistics which
increase as the two distributions are further from each other [20].

Salicrú et al. [16] proposed the $h$-$\phi$ family of divergences as follows. Consider the random variables $X$ and $Y$
defined on the same support $S$ with distributions characterized by the densities $f_X(x; \theta_1)$ and $f_Y(x; \theta_2)$, respectively,
where $\theta_1$ and $\theta_2$ are parameters. The $h$-$\phi$ divergence between $X$ and $Y$ is given by

$$D^h_\phi(X, Y) = h\left( \int_S \phi\left( \frac{f_X(x; \theta_1)}{f_Y(x; \theta_2)} \right) f_Y(x; \theta_2)\, dx \right), \tag{1}$$

where $\phi : (0, \infty) \to [0, \infty)$ is a convex function and $h : (0, \infty) \to [0, \infty)$ is a strictly increasing function with
$h(0) = 0$ and $h'(x) > 0$ for all $x \in S$. Table I presents the choices of $h$ and $\phi$ employed in [21] and the
divergences they lead to.

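TABLE I

$(h, \phi)$-DIVERGENCES AND RELATED $\phi$ AND $h$ FUNCTIONS.

| $(h, \phi)$-divergence | $h(y)$ | $\phi(x)$ |
|---|---|---|
| Kullback-Leibler | $y$ | $x \log(x)$ |
| Rényi (order $0 < \beta < 1$) | $\frac{1}{\beta - 1} \log((\beta - 1) y + 1)$, $0 \leq y < \frac{1}{1 - \beta}$ | $\frac{x^{1-\beta} + x^{\beta} - \beta(x - 1) - 2}{2(\beta - 1)}$ |
| Hellinger | $y/2$, $0 \leq y < 2$ | $(\sqrt{x} - 1)^2$ |
| Bhattacharyya | $-\log(-y + 1)$, $0 \leq y < 1$ | $-\sqrt{x} + \frac{x + 1}{2}$ |
| $\chi^2$ | $y/4$ | $(x - 1)^2 (x + 1)/x$ |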



These $h$-$\phi$ divergences are not granted to be symmetric, so they are not necessarily distances. A simple way to
overcome this is computing

$$d^h_\phi(X, Y) = \frac{D^h_\phi(X, Y) + D^h_\phi(Y, X)}{2}, \tag{2}$$

regardless of whether $D^h_\phi(\cdot, \cdot)$ is symmetric or not. Furthermore, if $X$ and $Y$ obey the same distribution with possibly
only different parameters, it is enough to write $d^h_\phi(\theta_1, \theta_2)$. Doing so, it is granted that $d^h_\phi(\theta_1, \theta_2) = 0$ if and only
if $\theta_1 = \theta_2$ and that $d^h_\phi(\theta_1, \theta_2) \geq 0$, but how big this quantity is has no immediate interpretation.
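The asymmetry of a divergence, and the effect of averaging its two orderings, can be checked numerically on a toy case. The sketch below is only an illustration under stated assumptions: it uses two exponential densities (not a model from this paper), the Kullback-Leibler choice $h(y) = y$, $\phi(x) = x \log x$, and a hand-written trapezoidal rule; the function names are hypothetical.

```python
import math

def trapezoid(ys, xs):
    """Composite trapezoidal rule on the grid xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))

def exp_pdf(x, rate):
    """Density of an exponential law with the given rate (toy model)."""
    return rate * math.exp(-rate * x)

def kl_divergence(rate_x, rate_y, upper=40.0, steps=40_000):
    """D_KL(X || Y): integral of f_X * log(f_X / f_Y), truncated at `upper`."""
    xs = [upper * i / steps for i in range(steps + 1)]
    ys = []
    for x in xs:
        fx, fy = exp_pdf(x, rate_x), exp_pdf(x, rate_y)
        ys.append(fx * math.log(fx / fy))
    return trapezoid(ys, xs)

def symmetrized(divergence, a, b):
    """Average of the two orderings, which is symmetric by construction."""
    return 0.5 * (divergence(a, b) + divergence(b, a))

forward = kl_divergence(1.0, 2.0)    # differs from the reverse ordering
backward = kl_divergence(2.0, 1.0)
distance = symmetrized(kl_divergence, 1.0, 2.0)
```

For exponential laws with rates $a$ and $b$ the divergence has the closed form $\log(a/b) + b/a - 1$, so the numerical values can be compared against it.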

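Salicrú et al. [16] provided a means to transform distances into test statistics with known asymptotic properties.
Let $\widehat{\theta}_1$ and $\widehat{\theta}_2$ be maximum likelihood estimators of $\theta_1$ and $\theta_2$ based on samples of sizes $m$ and $n$, respectively.
The parameter space is $\Theta \subset \mathbb{R}^M$. Under the null hypothesis $H_0 : \theta_1 = \theta_2$, the test statistic

$$S^h_\phi(\widehat{\theta}_1, \widehat{\theta}_2) = \frac{2mn}{m + n} \frac{d^h_\phi(\widehat{\theta}_1, \widehat{\theta}_2)}{h'(0)\, \phi''(1)} \tag{3}$$

converges in distribution to a $\chi^2_M$ distributed random variable, where $M$ is the number of parameters of the model,
provided $m, n \to \infty$ such that $m/(m + n) \to \lambda \in (0, 1)$.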
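The multiplier that turns a distance into a test statistic depends on $h$ and $\phi$ only through $h'(0)$ and $\phi''(1)$. The following sketch evaluates it by finite differences, assuming the statistic has the form $2mn\, d / ((m+n)\, h'(0)\, \phi''(1))$ and assuming the usual $(h, \phi)$ pairs for the Kullback-Leibler, Bhattacharyya and Hellinger cases; function names and step sizes are illustrative choices, not part of the original method.

```python
import math

def second_derivative(f, x, eps=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + eps) - 2.0 * f(x) + f(x - eps)) / eps ** 2

def first_derivative_at_zero(f, eps=1e-6):
    """One-sided difference, since h is only required on y >= 0."""
    return (f(eps) - f(0.0)) / eps

def scale_factor(h, phi, m, n):
    """Multiplier 2mn / ((m + n) h'(0) phi''(1)) applied to the distance."""
    return 2.0 * m * n / ((m + n) * first_derivative_at_zero(h)
                          * second_derivative(phi, 1.0))

# Assumed (h, phi) pairs; each tuple is (h, phi).
kullback_leibler = (lambda y: y, lambda x: x * math.log(x))
bhattacharyya = (lambda y: -math.log(1.0 - y),
                 lambda x: -math.sqrt(x) + (x + 1.0) / 2.0)
hellinger = (lambda y: y / 2.0, lambda x: (math.sqrt(x) - 1.0) ** 2)

m, n = 100, 100
factors = {
    "Kullback-Leibler": scale_factor(*kullback_leibler, m, n),
    "Bhattacharyya": scale_factor(*bhattacharyya, m, n),
    "Hellinger": scale_factor(*hellinger, m, n),
}
```

With $m = n = 100$ the factor is close to $2mn/(m+n) = 100$ for Kullback-Leibler and to $8mn/(m+n) = 400$ for the other two, which is why the leading constants differ among the statistics.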
Nascimento et al. [21] derived $h$-$\phi$ tests for the $\mathcal{G}^0$ model for intensity SAR data and used them for the
discrimination of targets in remote sensing images. Cintra et al. [22] compared those tests with the Kolmogorov-Smirnov
test, and verified their robustness. Frery et al. [17] derived these tests for polarimetric SAR data under the
Wishart model. These last results will be recalled in the next section.

III. Tests Based on Stochastic Distances Between Models 

The Wishart law is widely accepted as a model for PolSAR data, mainly on homogeneous areas. This model 
stems from the multilook processing of data which obey the complex Gaussian distribution. 

We may consider systems with $q$ polarization elements, which record the complex Gaussian distributed random
vector $y = (S_1\; S_2\; \cdots\; S_q)^T$, where '$T$' denotes vector transposition. This distribution is characterized by its
complex covariance matrix $\Sigma = E(y y^*)$, where '$*$' denotes the complex conjugate transpose, and $E(\cdot)$ is the
statistical expectation operator. In order to enhance the signal-to-noise ratio, $L$ independent and identically distributed
samples are usually averaged to form the $L$-looks covariance matrix:

$$Z = \frac{1}{L} \sum_{\ell=1}^{L} y_\ell\, y_\ell^*. \tag{4}$$

Under these hypotheses, $Z$ follows a scaled complex Wishart distribution with parameters $\Sigma$ and $L$ (denoted by
$Z \sim \mathcal{W}(\Sigma, L)$), characterized by the following probability density function:

$$f_Z(Z; \Sigma, L) = \frac{L^{qL} |Z|^{L-q}}{|\Sigma|^L\, \Gamma_q(L)} \exp\left(-L \operatorname{tr}(\Sigma^{-1} Z)\right), \tag{5}$$

where $\Gamma_q(L) = \pi^{q(q-1)/2} \prod_{i=0}^{q-1} \Gamma(L - i)$, $L \geq q$, $\Gamma(\cdot)$ is the gamma function, and $\operatorname{tr}(\cdot)$ is the trace operator. It is
important to observe that this Wishart distribution satisfies $E(Z) = \Sigma$. The maximum likelihood estimator of $\Sigma$,
based on $N$ independent samples, is the sample mean $\widehat{\Sigma} = N^{-1} \sum_{i=1}^{N} Z_i$, and $L$ can be estimated by any of the
techniques discussed in [23].
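As a small numerical companion to the density and the estimator above, the sketch below evaluates the logarithm of the scaled complex Wishart density and the sample-mean estimator with NumPy. It assumes the density has the form $L^{qL} |Z|^{L-q} \exp(-L \operatorname{tr}(\Sigma^{-1} Z)) / (|\Sigma|^L \Gamma_q(L))$; the function names are hypothetical.

```python
import math
import numpy as np

def log_multivariate_gamma(q, L):
    """log Gamma_q(L) = log(pi^{q(q-1)/2} * prod_{i=0}^{q-1} Gamma(L - i))."""
    return (0.5 * q * (q - 1) * math.log(math.pi)
            + sum(math.lgamma(L - i) for i in range(q)))

def wishart_logpdf(Z, Sigma, L):
    """Log-density of the scaled complex Wishart law W(Sigma, L) at Z."""
    Z = np.asarray(Z, dtype=complex)
    Sigma = np.asarray(Sigma, dtype=complex)
    q = Z.shape[0]
    logdet_Z = np.linalg.slogdet(Z)[1]          # log |det Z|
    logdet_Sigma = np.linalg.slogdet(Sigma)[1]  # log |det Sigma|
    trace_term = np.trace(np.linalg.solve(Sigma, Z)).real  # tr(Sigma^-1 Z)
    return float(q * L * math.log(L) + (L - q) * logdet_Z
                 - L * logdet_Sigma - log_multivariate_gamma(q, L)
                 - L * trace_term)

def ml_covariance(samples):
    """Sample mean of N covariance matrices, shape (N, q, q) -> (q, q)."""
    return np.asarray(samples).mean(axis=0)
```

For $q = 1$ the expression reduces to a Gamma density with shape $L$ and mean $\sigma$, which gives a simple sanity check of the implementation.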

Frery et al. [17] computed stochastic distances between complex Wishart distributions based on the $h$-$\phi$ divergences
presented in Table I in their most general form (different covariance matrices and number of looks). In the
following we derive the test statistics for the case of same number of looks $L$, assumed known. The null hypothesis
under which these statistics asymptotically follow a $\chi^2$ distribution is $H_0 : \Sigma_1 = \Sigma_2$.
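$$S_{\mathrm{KL}}(\widehat{\Sigma}_1, \widehat{\Sigma}_2) = \frac{2mn}{m+n}\, L \left[ \frac{\operatorname{tr}\left(\widehat{\Sigma}_1^{-1} \widehat{\Sigma}_2 + \widehat{\Sigma}_2^{-1} \widehat{\Sigma}_1\right)}{2} - q \right], \tag{6}$$

$$S_{\mathrm{B}}(\widehat{\Sigma}_1, \widehat{\Sigma}_2) = \frac{8mn}{m+n}\, L \left[ \frac{\log|\widehat{\Sigma}_1| + \log|\widehat{\Sigma}_2|}{2} - \log\left| \left( \frac{\widehat{\Sigma}_1^{-1} + \widehat{\Sigma}_2^{-1}}{2} \right)^{-1} \right| \right], \tag{7}$$

$$S_{\mathrm{H}}(\widehat{\Sigma}_1, \widehat{\Sigma}_2) = \frac{8mn}{m+n} \left[ 1 - \left( \frac{\left| \left( \frac{\widehat{\Sigma}_1^{-1} + \widehat{\Sigma}_2^{-1}}{2} \right)^{-1} \right|}{\sqrt{|\widehat{\Sigma}_1|\, |\widehat{\Sigma}_2|}} \right)^{L} \right], \tag{8}$$

$$S^{\beta}_{\mathrm{R}}(\widehat{\Sigma}_1, \widehat{\Sigma}_2) = \frac{2mn}{\beta(m+n)} \left\{ \frac{\log 2}{1-\beta} + \frac{1}{\beta - 1} \log \left( \left[ |\widehat{\Sigma}_1|^{-\beta} |\widehat{\Sigma}_2|^{\beta-1} \left| \left(\beta \widehat{\Sigma}_1^{-1} + (1-\beta) \widehat{\Sigma}_2^{-1}\right)^{-1} \right| \right]^{L} + \left[ |\widehat{\Sigma}_1|^{\beta-1} |\widehat{\Sigma}_2|^{-\beta} \left| \left(\beta \widehat{\Sigma}_2^{-1} + (1-\beta) \widehat{\Sigma}_1^{-1}\right)^{-1} \right| \right]^{L} \right) \right\}, \tag{9}$$

$$S_{\chi^2}(\widehat{\Sigma}_1, \widehat{\Sigma}_2) = \frac{mn}{2(m+n)} \left[ \left( \frac{|\widehat{\Sigma}_1|}{|\widehat{\Sigma}_2|^2} \operatorname{abs}\left( \left| \left(2 \widehat{\Sigma}_2^{-1} - \widehat{\Sigma}_1^{-1}\right)^{-1} \right| \right) \right)^{L} + \left( \frac{|\widehat{\Sigma}_2|}{|\widehat{\Sigma}_1|^2} \operatorname{abs}\left( \left| \left(2 \widehat{\Sigma}_1^{-1} - \widehat{\Sigma}_2^{-1}\right)^{-1} \right| \right) \right)^{L} - 2 \right], \tag{10}$$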

where 'abs' denotes absolute value. Equations (6), (7), (8), (9) and (10) are, respectively, the Kullback-Leibler,
Bhattacharyya, Hellinger, Rényi of order $\beta$ and $\chi^2$ test statistics based on stochastic distances. Each test rejects
the null hypothesis at confidence level $1 - \alpha$ (significance level $\alpha$) if $\Pr(\chi^2_{q^2} > s^h_\phi) \leq \alpha$, where $s^h_\phi$ is the observed value
of the statistic and $\chi^2_{q^2}$ follows a $\chi^2$ distribution with $q^2$ degrees of freedom.

Notice that Equations (6)-(10) rely on two simple operations on complex matrices: the inverse and the determinant.
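Since only inverses and determinants are involved, the five statistics fit in a few lines of NumPy. The sketch below is a hedged illustration, not the authors' implementation: it assumes the equal-looks expressions in the form usually given for complex Wishart laws (Kullback-Leibler, Bhattacharyya, Hellinger, Rényi of order $\beta$, and $\chi^2$), works with log-determinants for numerical stability, and uses a hypothetical function name and return layout.

```python
import numpy as np

def _logdet(A):
    """log |det A|; the absolute value matches the 'abs' in the chi-square case."""
    return np.linalg.slogdet(A)[1]

def wishart_test_statistics(S1, S2, L, m, n, beta=0.9):
    """Five test statistics between W(S1, L) and W(S2, L), same looks L.

    S1, S2: estimated q x q Hermitian covariance matrices; m, n: sample sizes.
    """
    S1 = np.asarray(S1, dtype=complex)
    S2 = np.asarray(S2, dtype=complex)
    q = S1.shape[0]
    I1, I2 = np.linalg.inv(S1), np.linalg.inv(S2)
    ld1, ld2 = _logdet(S1), _logdet(S2)
    w = m * n / (m + n)

    kl = 2 * w * L * (np.trace(I1 @ S2 + I2 @ S1).real / 2 - q)

    # log |((S1^-1 + S2^-1) / 2)^-1|
    ld_mid = -_logdet((I1 + I2) / 2)
    bh = 8 * w * L * ((ld1 + ld2) / 2 - ld_mid)
    he = 8 * w * (1 - np.exp(L * (ld_mid - (ld1 + ld2) / 2)))

    # Logs of the two bracketed terms of the Renyi statistic.
    a = L * (-beta * ld1 + (beta - 1) * ld2 - _logdet(beta * I1 + (1 - beta) * I2))
    b = L * ((beta - 1) * ld1 - beta * ld2 - _logdet(beta * I2 + (1 - beta) * I1))
    re = (2 * w / beta) * (np.log(2) / (1 - beta) + np.logaddexp(a, b) / (beta - 1))

    # Logs of the two terms of the chi-square statistic.
    c1 = L * (ld1 - 2 * ld2 - _logdet(2 * I2 - I1))
    c2 = L * (ld2 - 2 * ld1 - _logdet(2 * I1 - I2))
    ch = (w / 2) * (np.exp(c1) + np.exp(c2) - 2)

    return {"KL": float(kl), "B": float(bh), "H": float(he),
            "R": float(re), "chi2": float(ch)}
```

All five values vanish when the two matrices coincide and are symmetric in their arguments. The $\chi^2$ terms grow exponentially with $L$, so for very different matrices they may overflow in floating point.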

Oftentimes complete PolSAR data are not available. For instance, Radarsat-2 provides the HH, VV, HV and VH
intensities, while dual polarizations are available from Envisat (HH-HV or VV-VH) and COSMO-SkyMed (HH-HV or
HH-VV). In these cases, only elements of the main diagonal of the $L$-looks covariance matrix $Z$ are provided.
Multivariate Gamma models for these data can be derived as marginal distributions from the scaled complex Wishart
law characterized by the density given in equation (5). In practice, such marginal distributions are available for
both the bivariate and trivariate cases. The bivariate case, cf. [24, Eq. (30)], was used in a maximum likelihood
classification algorithm for dual intensity SAR data in [25]. Hagedorn et al. [26] derived both the bi- and tri-variate
$\chi^2$ distributions of diagonal elements of a Wishart law, but there are currently no expressions available for the
distances between these multivariate chi-squared distributions.

Additionally, an increased number of looks and the amplitude format yield a distribution which can be approxi- 
mated by a multivariate Gaussian law. This, and the fact that multivariate Gaussian classifiers are a commodity of 
image processing software, suggests the use of the Gaussian model as a testbed for the data here considered. 

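Theodoridis and Koutroumbas [27] compute stochastic distances under the $q$-variate Gaussian model. Using these
results and equation (3), we derived the Bhattacharyya test statistic for the null hypothesis $H_0 : (\mu_1, \Sigma_1) = (\mu_2, \Sigma_2)$:

$$S_{\mathrm{B}} = \frac{mn}{m+n} \left[ (\widehat{\mu}_1 - \widehat{\mu}_2)^T \left( \frac{\widehat{\Sigma}_1 + \widehat{\Sigma}_2}{2} \right)^{-1} (\widehat{\mu}_1 - \widehat{\mu}_2) + 4 \log \left( \frac{\left| \frac{\widehat{\Sigma}_1 + \widehat{\Sigma}_2}{2} \right|}{\sqrt{|\widehat{\Sigma}_1|\, |\widehat{\Sigma}_2|}} \right) \right], \tag{11}$$

where $\widehat{\mu}_i$ and $\widehat{\Sigma}_i$ are the maximum likelihood estimators of the mean vector and the covariance matrix, $i = 1, 2$.
The null hypothesis is rejected at confidence level $1 - \alpha$ if $\Pr(\chi^2_{q(q+3)/2} > s_{\mathrm{B}}) \leq \alpha$, where $\chi^2_{q(q+3)/2}$ follows a $\chi^2$ distribution
with $q(q + 3)/2$ degrees of freedom.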
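The Gaussian counterpart can be sketched in the same spirit. The code assumes the statistic is $8mn/(m+n)$ times the classical Bhattacharyya distance between two multivariate Gaussian laws, written as $\frac{mn}{m+n}$ times a quadratic term plus four times a log-determinant term; the function name is hypothetical and real-valued amplitude data are assumed.

```python
import numpy as np

def gaussian_bhattacharyya_statistic(mu1, S1, mu2, S2, m, n):
    """Bhattacharyya-based test statistic between N(mu1, S1) and N(mu2, S2)."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    S1, S2 = np.asarray(S1, dtype=float), np.asarray(S2, dtype=float)
    S_avg = (S1 + S2) / 2
    diff = mu1 - mu2
    # Quadratic form (mu1 - mu2)^T S_avg^-1 (mu1 - mu2)
    quad = diff @ np.linalg.solve(S_avg, diff)
    # log( |S_avg| / sqrt(|S1| |S2|) ) through log-determinants
    log_term = (np.linalg.slogdet(S_avg)[1]
                - 0.5 * (np.linalg.slogdet(S1)[1] + np.linalg.slogdet(S2)[1]))
    return float((m * n / (m + n)) * (quad + 4 * log_term))
```

With equal covariance matrices the logarithmic term vanishes and the statistic reduces to a scaled Mahalanobis-type distance between the mean vectors.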

IV. Region Classification based on Test Statistics 

In this section we define the two classification products we obtain using test statistics based on stochastic distances:
minimum test statistics and p-value maps.

Assume the image support is partitioned in $r$ disjoint segments $C_1, \ldots, C_r$. The PolSAR data from each segment
are denoted $Z_i$, $1 \leq i \leq r$, and a covariance matrix $\widehat{\Sigma}_i$ is estimated with these data by maximum likelihood.
The user provides $k$ prototypes in the form of samples (supervised scheme), with which covariance matrices $\widehat{\Sigma}_\ell$,
$1 \leq \ell \leq k$, are estimated by maximum likelihood. The purpose is to classify each segment $C_i$ in one of the $k$
prototypes.

Compute the $r \times k$ test statistics which contrast the null hypothesis $H_0 : \Sigma_i = \Sigma_\ell$ with one of the equations
given in (6)-(10) for every segment $1 \leq i \leq r$ and every prototype $1 \leq \ell \leq k$. The classification based on minimum
test statistic consists of assigning the segment $C_i$ to the class represented by prototype $\ell$ if

$$S^h_\phi(\widehat{\Sigma}_i, \widehat{\Sigma}_\ell) < S^h_\phi(\widehat{\Sigma}_i, \widehat{\Sigma}_t) \tag{12}$$

for every $t \neq \ell$. Once the segment $C_i$ has been assigned to the class represented by prototype $\ell$, the p-value of the
assignment is computed as

$$p = \Pr\left( \chi^2_\nu > S^h_\phi(\widehat{\Sigma}_i, \widehat{\Sigma}_\ell) \right), \tag{13}$$

where $\nu$ is the number of parameters of the considered model: $\nu = q^2$ for the Wishart distribution, and $\nu = q(q+3)/2$
for the $q$-variate Gaussian distribution. This value gives an idea of the confidence of the decision.
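The decision rule and its p-value can be sketched with the standard library alone. The code assumes the p-value is the upper tail probability of a $\chi^2$ law with $\nu$ degrees of freedom evaluated at the smallest statistic; the tail is computed with the usual series and continued-fraction expansions of the regularized incomplete gamma function, and all names are hypothetical.

```python
import math

def chi2_sf(x, dof):
    """Pr(chi2_dof > x) via the regularized incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion of the lower regularized function P(a, z).
        term = total = 1.0 / a
        k = a
        for _ in range(10_000):
            k += 1.0
            term *= z / k
            total += term
            if abs(term) < abs(total) * 1e-16:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz) for the upper function Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-16:
            break
    return h * math.exp(log_prefactor)

def classify_segment(statistics, dof):
    """Return (index of the smallest statistic, p-value of that assignment)."""
    label = min(range(len(statistics)), key=statistics.__getitem__)
    return label, chi2_sf(statistics[label], dof)

# Illustrative values only: one segment against k = 3 prototypes, q = 3.
label, p_value = classify_segment([30.2, 4.1, 18.7], dof=9)
```

Here `dof=9` corresponds to $q^2$ with $q = 3$ polarization channels under the Wishart model; for the Gaussian model one would pass $q(q+3)/2$ instead.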

The rule given by inequality (12) opens a number of interesting alternatives; among them, instead of choosing one
test statistic, one may use all available ones. Each test statistic will provide a class for each segment, and these classifications
can be fused by majority vote. The information provided by equation (13) can also be used; a fuzzy classification
can be made by assigning each segment to all the classes whose p-value is above a certain threshold.
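Both alternatives are simple to express in code. The sketch below is illustrative only: the tie-breaking rule (the first label encountered wins) and the default threshold of 0.05 are assumptions, since the text does not fix them, and the function names are hypothetical.

```python
from collections import Counter

def majority_vote(labels_per_statistic):
    """Fuse per-statistic label lists (one label per segment) by majority vote.

    labels_per_statistic: dict mapping a statistic name to a list of labels.
    Ties are resolved in favor of the label seen first (an arbitrary choice).
    """
    names = list(labels_per_statistic)
    n_segments = len(labels_per_statistic[names[0]])
    fused = []
    for i in range(n_segments):
        votes = Counter(labels_per_statistic[name][i] for name in names)
        fused.append(votes.most_common(1)[0][0])
    return fused

def fuzzy_classes(p_values, threshold=0.05):
    """All class indices whose p-value exceeds the threshold."""
    return [label for label, p in enumerate(p_values) if p > threshold]

votes = {"KL": [0, 1, 2], "B": [0, 1, 1], "H": [0, 2, 1]}
fused = majority_vote(votes)
candidates = fuzzy_classes([0.30, 0.01, 0.07])
```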

V. Application to PolSAR Data 



The classification procedure described in Section IV was applied and evaluated under two approaches: using
simulated data, which were generated under the complex Wishart distribution, and using a real SIR-C full PolSAR
image, in L-band.
A. SIR-C Polarimetric Data Description

The SIR-C full polarimetric image is from an agricultural area located in Petrolina city, Northeast of Brazil.
Table II presents the study area location and the basic characteristics of the SIR-C image. The main observed land
cover classes are River, Caatinga, Prepared Soil, Soybean in three different phenological stages, Tillage, and Corn
in two phenological stages. The training and test samples for these classes are shown in Figures 1(a) and 1(b) and
their legends in Figure 1(c). These samples were properly sub-sampled to diminish the influence of pixel correlation,
and their final sizes are shown in Table III.



TABLE II

SIR-C IMAGE AND STUDY AREA INFORMATION.

| Item | Value |
|---|---|
| Study area location | 09° 07' S, 40° 18' W (central coordinate), about 40 km northeast of the city of Petrolina-PE, Brazil |
| Acquisition date | April 14th, 1994 |
| Image size (pixels) | 407 × 370 |
| Nominal number of looks | 4.785 |
| Frequency | L-band, 1.254 GHz |
| Pixel spacing | 12.5 m × 12.5 m |
| Incidence angle | 49.496° |
| Orbit direction | Descending |



TABLE III

TRAINING AND TEST SAMPLES DESCRIPTION.

| Class | Description | # Training samples | # Test samples |
|---|---|---|---|
| River | Water body | 1192 | 976 |
| Caatinga | A stepped vegetation composed of stunted trees and thorny bushes, found in areas of little rainfall in Brazil | 1006 | 820 |
| Prepared Soil | Soil ready for seeding | 715 | 442 |
| Soybean 1 | Soybean with approximately 52 days after seeding | 212 | 99 |
| Soybean 2 | Soybean with approximately 66 days after seeding | 174 | 117 |
| Soybean 3 | Soybean with approximately 113 days after seeding | 390 | 216 |
| Tillage | Agricultural crop residues | 181 | 98 |
| Corn 1 | Corn with less than 124 days after seeding | 661 | 364 |
| Corn 2 | Corn with approximately 133 days after seeding | 191 | 77 |



Frery et al. [2] concluded that, with the exception of the class "River", the samples presented in Fig. 1 depart
from the Wishart distribution [2, p. 7, Table III] and are better explained by the $\mathcal{K}_P$ and $\mathcal{G}^0_P$ distributions.
As previously mentioned, there are no analytic expressions for the stochastic distances between such generalized
models, and numerical integration would be unfeasible due to the need to integrate on the domain of all positive
definite Hermitian matrices. In this manner, although the exact description of the data could be improved, adopting
the Wishart model still leads to interesting results.

Fig. 1. L-band SIR-C intensity images (HH(R), HV(G), VV(B)) and location of training and test samples: (a) training samples, (b) test samples, and (c) land cover classes legend (River, Caatinga, Prepared Soil, Soybean 1, Soybean 2, Soybean 3, Tillage, Corn 1, Corn 2).

B. Simulated Data Description 

Simulated data were generated under the symmetric circularity assumption [5]. The simulation aims at obtaining
random covariance matrix realizations under the complex Wishart distribution with a fixed number of looks ($L$).
Initially, single-look polarimetric SAR data, represented by the $q$-variate complex Gaussian random vector $y_q$, are
generated. Assuming that $y_q$ follows a $q$-variate complex Gaussian distribution with zero mean and covariance
matrix $\Sigma_q$ (denoted $y_q \sim \mathcal{CN}_q(0, \Sigma_q)$), the simulation is performed by first sampling a $2q$-variate vector $x_{2q}$ such
that $x_{2q} \sim \mathcal{N}_{2q}(0, \Sigma_{2q})$, where, under the symmetric circular assumption and according to [5] and [28], $\Sigma_{2q}$ is such
that:

$$\Sigma_{2q} = \frac{1}{2} \begin{pmatrix} \Re(\Sigma_q) & -\Im(\Sigma_q) \\ \Im(\Sigma_q) & \Re(\Sigma_q) \end{pmatrix},$$

where $\Re$ and $\Im$ denote the real and imaginary parts of a complex number, respectively. The first $q$ elements of $x_{2q}$
become the real parts of the elements in the complex vector $y_q$ and the last $q$ elements of $x_{2q}$ become the imaginary
parts of the elements in the complex vector $y_q$. This process is repeated as many times as the required number of
samples, where each sample represents a polarimetric pixel of an image. The $L$-looks complex covariance matrix
image is obtained according to equation (4).
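The simulation steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the code used in the paper: the $2q \times 2q$ real covariance is taken to be half the block matrix built from the real and imaginary parts of $\Sigma_q$ (the standard form for circular complex Gaussian vectors), and the function names, seed and example matrix are hypothetical.

```python
import numpy as np

def real_covariance(Sigma):
    """2q x 2q covariance of (Re y, Im y) for a circular y ~ CN_q(0, Sigma)."""
    Sigma = np.asarray(Sigma, dtype=complex)
    return 0.5 * np.block([[Sigma.real, -Sigma.imag],
                           [Sigma.imag, Sigma.real]])

def simulate_wishart(Sigma, looks, n_pixels, rng):
    """Draw n_pixels L-look covariance matrices Z = (1/L) sum_l y_l y_l^*."""
    Sigma = np.asarray(Sigma, dtype=complex)
    q = Sigma.shape[0]
    # Real vectors x_2q ~ N(0, Sigma_2q), one per pixel and per look.
    x = rng.multivariate_normal(np.zeros(2 * q), real_covariance(Sigma),
                                size=(n_pixels, looks))
    # First q entries are real parts, last q entries are imaginary parts.
    y = x[..., :q] + 1j * x[..., q:]
    # Average of the outer products y_l y_l^* over the looks.
    return np.einsum('pla,plb->pab', y, y.conj()) / looks

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5 + 0.3j], [0.5 - 0.3j, 1.0]])
Z = simulate_wishart(Sigma, looks=4, n_pixels=5000, rng=rng)
```

Each `Z[p]` is Hermitian, and the average of the simulated matrices should approach the covariance matrix used to generate them as the number of pixels grows.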
The simulation process described above was used to produce images representing the nine classes observed in the
SIR-C L-band PolSAR image. The covariance matrices for each class were those estimated from
the training samples presented in Figure 1(a), whose numbers of pixels are shown in Table III. These covariance
matrices are presented in equations (14)-(22) of the appendix. The simulation was performed with four looks and
three polarization bands, HH, HV and VV. The simulated covariance matrix image of each class has 150 × 150 pixels.
A final image was generated by mosaicking the simulated images of the individual classes. This final image has
450 × 450 pixels, i.e., the class images were arranged in a 3 × 3 configuration. An RGB color composition
of the intensity bands from the covariance matrix image is shown in Figure 2.



Fig. 2. Simulated PolSAR image: (a) intensities color composition - HH(R), HV(G), VV(B), with the regions labeled River, Caatinga, Prepared Soil, Soybean 1, Soybean 2, Soybean 3, Tillage, Corn 1 and Corn 2, and (b) segmentation scheme in 15 × 15 pixels segments.



The region classification procedure was applied to four different segmentations of the simulated image, with square
segments of sizes 5 × 5, 10 × 10, 15 × 15 and 30 × 30 pixels, respectively. The 15 × 15 segmented image is presented in
Figure 2(b). The prototype of each class, also needed for the classification procedure, was generated by sampling 900
pixels, representing a training sample of 30 × 30 pixels for each class. The simulation of the prototypes, performed
independently of the simulated image, ensures that identical data are not being considered in the computation of
test statistics and, consequently, in the determination of the corresponding p-values.
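The regular segmentation and the per-segment maximum likelihood estimate (the sample mean of the covariance matrices in the segment) can be sketched with a reshape. The array layout `(rows, cols, q, q)` and the function names are assumptions made for this illustration.

```python
import numpy as np

def block_covariances(Z_image, size):
    """Average the covariance matrices inside each size x size block.

    Z_image: array of shape (rows, cols, q, q); returns an array of shape
    (number_of_blocks, q, q), blocks ordered row by row.
    """
    rows, cols, q, _ = Z_image.shape
    br, bc = rows // size, cols // size
    blocks = Z_image[:br * size, :bc * size].reshape(br, size, bc, size, q, q)
    return blocks.mean(axis=(1, 3)).reshape(br * bc, q, q)

def segment_count(side, size):
    """Number of size x size blocks in a side x side image."""
    return (side // size) ** 2

counts = [segment_count(450, s) for s in (5, 10, 15, 30)]
```

For the 450 × 450 mosaic this gives 8100, 2025, 900 and 225 segments for the four block sizes.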

C. Assessing the Classification Procedure using Simulated Data 

The region-based classifications of the simulated data, using stochastic distances and the minimum test statistic rule
given in equation (12), aimed at evaluating the classification procedure under a rigorously controlled statistical
model, since the data were simulated from the complex Wishart distribution. An additional classification result
was obtained, considering the multivariate Gaussian model for the multivariate amplitude image obtained from
the simulated PolSAR image. For this case, the analytic expression for the Bhattacharyya test statistic shown in
equation (11) was used.
Since there are four segmented images and six stochastic distances (five from the Wishart and one from the
Gaussian model), twenty-four classified images were produced. The classifications performed on segments of sizes
10 × 10, 15 × 15 and 30 × 30 pixels were 100% correct. The classifications of segments of size 5 × 5 pixels reached
a global accuracy of 99.81% for the Bhattacharyya, Kullback-Leibler, Hellinger and Rényi distances, and 99.58%
for the $\chi^2$ distance. Errors occurred in segments belonging to the simulated classes of Soybean 2 and Corn 2.

These results show the high quality of the proposed classifier when the assumptions of the data distribution are 
satisfied, especially for segments with large amounts of pixels (equal or greater than 100 pixels). 

Figure 3 shows the classified images using the six stochastic distances, for the case of 5 × 5 pixels segments. The
slight confusion between the Soybean 2 and Corn 2 classes can be observed in this figure, as well as some
segments of the Caatinga class classified as Corn 1 under the $\chi^2$ distance. Under the Gaussian model, the
global accuracy was 98.35% for the Bhattacharyya distance. A higher confusion between the Soybean 2 and Corn 2
classes was observed when compared with the results obtained by the classifiers that adopt the Wishart model, a
result which stresses the importance of using the proper distribution to model the data.

For each classification result, a map of the p-values of the test statistics, an indicator of the confidence of the
assignment decision, was also produced. These results are presented in Figure 4, where the white positions mark
those segments for which the null hypothesis (the equality between the covariance matrices of the segment and of
the assigned prototype) was not rejected at the 5% significance level. The percentages of these segments for each
segment size and each distance are presented in Table IV.



TABLE IV

Percentage of segments for which $H_0$ was not rejected at the 5% significance level, for the simulated data case.

| Distances | 5 × 5 pixels (8100 segments) | 10 × 10 pixels (2025 segments) | 15 × 15 pixels (900 segments) | 30 × 30 pixels (225 segments) |
|---|---|---|---|---|
| Bhattacharyya | 94.0 | 95.2 | 94.3 | 93.8 |
| Kullback-Leibler | 93.7 | 95.1 | 94.3 | 93.3 |
| Hellinger | 95.2 | 95.3 | 94.8 | 93.8 |
| Rényi (order $\beta = 0.9$) | 93.8 | 95.1 | 94.3 | 93.8 |
| $\chi^2$ | 75.5 | 91.2 | 92.8 | 92.4 |
| Bhattacharyya (Gaussian) | 90.6 | 94.1 | 95.1 | 98.2 |



The results presented in Figure 4 and in Table IV are compatible with the theoretically expected values. The
hypothesis test rejection rates were approximately 5% for all segmentation cases and stochastic distances, except
when the $\chi^2$ distance was used and when the Bhattacharyya Gaussian distance was applied to small (5 × 5 pixels)
segments. The rejection rates for the $\chi^2$ distance were higher than the theoretical values in all segmentation cases,
reaching approximately 24.5% for 5 × 5 pixels segments. The poor performance
of the $\chi^2$ distance test statistic was also observed in [17], where this large test size was first described.
Fig. 3. Classification results of the simulated data for segments of size 5 × 5 pixels. Panel titles include (d) Rényi of order $\beta$, (e) $\chi^2$, and (f) Bhattacharyya Gaussian. Legend: River, Caatinga, Prepared Soil, Soybean 1, Soybean 2, Soybean 3, Tillage, Corn 1, Corn 2.



The rejection rate of the Hellinger distance is approximately 5% in 5 × 5, 10 × 10 and 15 × 15 pixels segments, while the
Bhattacharyya Gaussian distance reaches this rate only in large segments (15 × 15 and 30 × 30), as expected according
to the Central Limit Theorem.

D. Assessing the Classification Procedure using SIR-C Polarimetric Image 

Prior to classification, the SIR-C image was segmented using the SegSAR software [29], a hierarchical multi-level
region growing segmentation algorithm designed for intensity SAR data which uses tests based on the Gamma and
Gaussian distributions. The SegSAR parameters used for segmentation were a minimum area of 100 pixels and a
similarity of 1 dB.

The equivalent number of looks was estimated considering all polarization channels using the method
described in [2], which is also referred to by Anfinsen et al. [23] as the Fractional Moment Estimate; the computed value
was 2.97. The segmented image is presented in Figure 5(a); each segment is shown in a color defined by associating
the RGB channels to the means of each intensity polarization (HH, HV and VV). The classification procedure
described by equation (12) was applied to the L-band SIR-C data using this segmented image. The test statistics are
given in equations (6)-(10), assuming the Wishart law, as well as equation (11), assuming the Gaussian law for
amplitude data. The p-value map, defined in equation (13), was computed for every segment in each classification.

The region classifications were also compared to the contextual ICM polarimetric classification described in [2],
which is also based on the Wishart distribution. Therefore, six region classifications and one ICM classification
were obtained. The classification performances were compared using the estimated Kappa coefficient of agreement
($\hat{\kappa}$), and the overall accuracy, as formulated in [30].
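Both accuracy measures follow directly from a confusion matrix. The sketch below uses the standard definitions of overall accuracy and of Cohen's Kappa; it does not compute the variance of $\hat{\kappa}$, and the exact formulation of [30] may differ in detail. The example matrix is made up for illustration.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Kappa from a square confusion matrix.

    confusion[i][j]: number of test samples of reference class i
    assigned to class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: proportion of samples on the main diagonal.
    p_o = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: sum of products of row and column marginals.
    p_e = sum(sum(confusion[i]) * sum(row[i] for row in confusion)
              for i in range(k)) / total ** 2
    return p_o, (p_o - p_e) / (1.0 - p_e)

overall, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
```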

The classification results are presented in Figure 5. The overall accuracy, the estimated Kappa coefficient of
agreement and its variance for all classifications are presented in Table V. The tests for equality of Kappa showed
that the classifications based on the Kullback-Leibler, Bhattacharyya, Hellinger and Rényi distances between Wishart
distributions produced statistically similar results. The contextual classification is only superior to the $\chi^2$ distance
classification, which is the worst stochastic distance-based classification, in agreement with the results found with
simulated data. The classification based on the Bhattacharyya Gaussian distance is only superior to the contextual
and to the $\chi^2$ distance classifications.



Table VI presents, for each stochastic distance, the percentage of segments with p-value greater than 0.05, i.e.,
the percentage of segments that were not rejected at this level. These segments are illustrated in white in Figure 6.
Although the values of Table V showed promising results for minimum distance classification using complex
Wishart distributions, the percentages shown in Table VI are far from the theoretical 95%.



TABLE V

Assessment of classification results for the L-band SIR-C image.

| Classification method | Overall accuracy (%) | $\hat{\kappa}$ | $s^2_{\hat{\kappa}}$ (×10^…) |
|---|---|---|---|
| Bhattacharyya | 86.60 | 0.8346 | 1.253 |
| Kullback-Leibler | 86.60 | 0.8346 | 1.253 |
| Hellinger | 85.97 | 0.8269 | 1.296 |
| Rényi (order $\beta = 0.9$) | 86.60 | 0.8346 | 1.253 |
| $\chi^2$ | 71.36 | 0.6544 | 2.081 |
| Bhattacharyya (Gaussian) | 85.35 | 0.8191 | 1.333 |
| Contextual ML/ICM - Wishart | 83.97 | 0.8025 | 1.430 |



VI. Conclusions and future work 

A new region-based classifier for PolSAR data using stochastic distances between complex Wishart distributions 
and derived hypothesis tests was presented. The proposed classifier was applied to simulated data and to a real 
L-band PolSAR image from the SIR-C mission. 
TABLE VI

Percentage of segments of the SIR-C L-band image that were not rejected at the 5% significance level.

| Distances | Percentage (%) |
|---|---|
| Bhattacharyya | 9.76 |
| Kullback-Leibler | 9.58 |
| Hellinger | 10.49 |
| Rényi (order $\beta = 0.9$) | 9.58 |
| $\chi^2$ | 6.33 |
| Bhattacharyya (Gaussian) | 6.33 |



The classification results using simulated PolSAR data based on the complex Wishart model reached an overall
accuracy of 100%, with the exception of a few misclassifications observed when small segments (5 × 5 pixels)
were used. The acceptance rates of the null hypothesis tests, which measure the confidence of the classification
assignments, were very close to the theoretically expected ones for almost all distances and segment sizes.
The poorest results occurred when the $\chi^2$ distance was used, especially in the classifications of small segments.
With the exception of this last case, the evidence allows us to conclude that the proposed classification method has a
very good performance and confidence when the data rigorously follow the Wishart model. Further evaluations with
observations that depart from independent Wishart data, such as under the presence of noise, departures from the pure
model, and spatial correlation, are under implementation and investigation.

The statistic based on the Hellinger distance between Wishart laws usually outperforms the results obtained
with other distances, especially for segmented images with small regions. This may be due to the robustness of the
procedures derived from this distance. The Bhattacharyya Gaussian distance is also a good option for images
having large segments.

The proposed region-based classifier, when applied to L-band PolSAR data from the SIR-C mission, also obtained
very good performance in terms of overall accuracy and $\hat{\kappa}$ coefficient of agreement. The best results were
obtained with the Bhattacharyya, Kullback-Leibler, Rényi and Hellinger distances between Wishart distributions.
The results using these distances outperformed the classification results obtained using multivariate amplitude data
and the Bhattacharyya distance between Gaussian laws. This evidence proves the relevance of using appropriate
modeling of the data when employing stochastic distances.

In comparison with a contextual Maximum Likelihood/ICM classifier [2], the new classifier also obtained better
results, with statistically superior $\hat{\kappa}$ values. Such improvement can also be observed by examining the large number
of undesirable small areas that still exist in the contextual result, while those artifacts are minimized by the
region-based classification.

The rejection rates of the null hypothesis tests concerning the real PolSAR data were far from the theoretically
expected values, reaching values of about 90% or more. Since the results with complex Wishart simulated data were
compatible with the theoretically expected significance level, this poor result with real data may be due
to a less than optimal description of the real data by the theoretical model. As mentioned before, many samples
are better modeled by more general distributions such as the $\mathcal{K}_P$ and $\mathcal{G}^0_P$ laws. These results suggest that the proposed
method is robust with respect to the classification map, but not with respect to the map of p-values.

Other possible sources of misfit are spatial correlation, which alters the effective sample sizes in Eq. (3), the
existence of more classes than those identified by the expert, the large size of some segments (as in the "River"
class), and the influence of an inadequate segmentation (the SegSAR algorithm used in this paper was developed
for intensity and not for PolSAR data). Further investigation with real data examples is needed in
order to clarify the possible vulnerability of hypothesis testing to these factors.

The analysis of the proposed region-based classification led us to conclude that the classifier has great potential
for PolSAR data analysis. It is noteworthy that the expressions that have to be computed rely on simple operations
on matrices: the determinant and the inverse. In the future, further investigation will be conducted using
the classifier with the intensity pair distribution for bivariate intensity data, a common SAR data
availability situation discussed in Section III.



Recent research [31], [32] reports interesting results with the use of the Geodesic Rao metric [33]. This, and other
test statistics for hypothesis testing of PolSAR data distributions, along with improved [34]-[36] and robust [37], [38]
estimation in models which incorporate texture [7], [39], are future lines of research.



Appendix 



The covariance matrices of the nine classes of the SIR-C images were estimated by maximum likelihood using
the selected training samples (Figure 1(a) and Table III). Equations (14) to (22) present the estimated covariance
matrices for the following classes: River, Caatinga, Prepared Soil, Soybean 1, Soybean 2, Soybean 3, Tillage, Corn 1,
and Corn 2, respectively. These matrices were the parameters used for image simulation under the Wishart model,
as described in Section V-B. Only the upper triangle and the diagonal are displayed in equations (14) to (22)
because the covariance matrix ($\Sigma$) is Hermitian and, therefore, the remaining elements are the complex conjugates.